Fetching Sorted Data using Python

Because of the nature of distributed data on a grid of computers, data isn't always organized in a way that is most useful. In many cases, you want the data to be displayed in an ordered form. To do this, you'll want to use the sortby= parameter of the fetch action. There is also a Pandas DataFrame-like way of fetching data in an ordered frorm.

We first need to start with a CAS connection.


In [1]:
import swat

conn = swat.CAS(host, port, username, password)

We need some data to work with, so we'll upload a small data set.


In [2]:
cars = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
cars


Out[2]:
CASTable('_T_8B0BFFFC_7FF29B2410C0', caslib='CASUSERHDFS(kesmit)')

Using the fetch action, we can bring down a sample of the data.


In [3]:
cars.fetch(to=5)


Out[3]:
§ Fetch
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Acura MDX SUV Asia All 36945.0 33337.0 3.5 6.0 265.0 17.0 23.0 4451.0 106.0 189.0
1 Acura 3.5 RL 4dr Sedan Asia Front 43755.0 39014.0 3.5 6.0 225.0 18.0 24.0 3880.0 115.0 197.0
2 Audi A41.8T convertible 2dr Sedan Europe Front 35940.0 32506.0 1.8 4.0 170.0 23.0 30.0 3638.0 105.0 180.0
3 Audi A6 3.0 4dr Sedan Europe Front 36640.0 33129.0 3.0 6.0 220.0 20.0 27.0 3561.0 109.0 192.0
4 Audi A6 2.7 Turbo Quattro 4dr Sedan Europe All 42840.0 38840.0 2.7 6.0 250.0 18.0 25.0 3836.0 109.0 192.0

elapsed 0.016s · user 0.029s · sys 0.031s · mem 9.9MB

To bring the data down in a sorted form, we use the sortby= parameter. This parameter can take a list of column names, or a list of dictionaries with the keys 'name', 'order', and 'formatted'. The 'name' key is required. The 'order' parameter can be set to 'ascending' or 'descending'. The 'formatted' parameter can be set to 'raw' or 'formatted' to sort the column based on the formatted value or the raw value; by default, it sorts by raw value. Let's start with just using column names.


In [4]:
cars.fetch(to=5, sortby=['Cylinders', 'EngineSize'])


Out[4]:
§ Fetch
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Mazda RX-8 4dr manual Sports Asia Rear 27200.0 25179.0 1.3 NaN 238.0 18.0 24.0 3029.0 106.0 174.0
1 Mazda RX-8 4dr automatic Sports Asia Rear 25700.0 23794.0 1.3 NaN 197.0 18.0 25.0 3053.0 106.0 174.0
2 Honda Insight 2dr (gas/electric) Hybrid Asia Front 19110.0 17911.0 2.0 3.0 73.0 60.0 66.0 1850.0 95.0 155.0
3 Honda Civic Hybrid 4dr manual (gas/electric) Hybrid Asia Front 20140.0 18451.0 1.4 4.0 93.0 46.0 51.0 2732.0 103.0 175.0
4 Scion xA 4dr hatch Sedan Asia Front 12965.0 12340.0 1.5 4.0 108.0 32.0 38.0 2340.0 93.0 154.0

elapsed 0.049s · user 0.107s · sys 0.125s · mem 51.5MB

To reverse the order of the EngineSize column, we need to use a dictionary for that element.


In [5]:
cars.fetch(to=5, sortby=['Cylinders', dict(name='EngineSize', order='descending')])


Out[5]:
§ Fetch
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Mazda RX-8 4dr manual Sports Asia Rear 27200.0 25179.0 1.3 NaN 238.0 18.0 24.0 3029.0 106.0 174.0
1 Mazda RX-8 4dr automatic Sports Asia Rear 25700.0 23794.0 1.3 NaN 197.0 18.0 25.0 3053.0 106.0 174.0
2 Honda Insight 2dr (gas/electric) Hybrid Asia Front 19110.0 17911.0 2.0 3.0 73.0 60.0 66.0 1850.0 95.0 155.0
3 GMC Canyon Z85 SL Regular Cab Truck USA Rear 16530.0 14877.0 2.8 4.0 175.0 18.0 25.0 3351.0 111.0 192.0
4 Chevrolet Colorado Z85 Truck USA All 18760.0 17070.0 2.8 4.0 175.0 18.0 23.0 3623.0 111.0 192.0

elapsed 0.0457s · user 0.101s · sys 0.11s · mem 51.5MB

DataFrame-style Sorting

As with most elements of the CASTable object, you can also apply sorting behaviors just like you do with Pandas DataFrames. After you do this, each time the data is fetched from the server, it will automatically apply the sorting options for you. This gives the appearance of a sorted CASTable object.

The CASTable object has a sort_values method (it also supports the older sort method name). The sort_values method takes a list of column names as well as optional ascending= and inplace= options. The ascending= option takes a boolean (for all columns) or a list of booleans (one for each column) to indicate whether it should be sorted in ascending or descending order. The inplace= option indicates whether the sort options should be applied to the CASTable that sort_value is called on, or if it should return a copy of the CASTable object with the sort options applied. Note that no copying or sorting is done when the sort_value option is used. The sorting only occurs when data is being brought back to the client side either through table.fetch directly, or through any of the other methods that use fetch in the background (e.g., head, tail)

Let's apply some sort options to our cars object.


In [6]:
sorted_cars = cars.sort_values(['Cylinders', 'EngineSize'])
sorted_cars.fetch(to=5)


Out[6]:
§ Fetch
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Mazda RX-8 4dr manual Sports Asia Rear 27200.0 25179.0 1.3 NaN 238.0 18.0 24.0 3029.0 106.0 174.0
1 Mazda RX-8 4dr automatic Sports Asia Rear 25700.0 23794.0 1.3 NaN 197.0 18.0 25.0 3053.0 106.0 174.0
2 Honda Insight 2dr (gas/electric) Hybrid Asia Front 19110.0 17911.0 2.0 3.0 73.0 60.0 66.0 1850.0 95.0 155.0
3 Honda Civic Hybrid 4dr manual (gas/electric) Hybrid Asia Front 20140.0 18451.0 1.4 4.0 93.0 46.0 51.0 2732.0 103.0 175.0
4 Scion xA 4dr hatch Sedan Asia Front 12965.0 12340.0 1.5 4.0 108.0 32.0 38.0 2340.0 93.0 154.0

elapsed 0.0556s · user 0.113s · sys 0.12s · mem 51.5MB

Now let's fetch some data using the head DataFrame method. You'll see that the data is still coming back in sorted order.


In [7]:
sorted_cars.head()


Out[7]:
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Mazda RX-8 4dr manual Sports Asia Rear 27200.0 25179.0 1.3 NaN 238.0 18.0 24.0 3029.0 106.0 174.0
1 Mazda RX-8 4dr automatic Sports Asia Rear 25700.0 23794.0 1.3 NaN 197.0 18.0 25.0 3053.0 106.0 174.0
2 Honda Insight 2dr (gas/electric) Hybrid Asia Front 19110.0 17911.0 2.0 3.0 73.0 60.0 66.0 1850.0 95.0 155.0
3 Honda Civic Hybrid 4dr manual (gas/electric) Hybrid Asia Front 20140.0 18451.0 1.4 4.0 93.0 46.0 51.0 2732.0 103.0 175.0
4 Scion xA 4dr hatch Sedan Asia Front 12965.0 12340.0 1.5 4.0 108.0 32.0 38.0 2340.0 93.0 154.0

Since we didn't use the inplace= option, the original CASTable object should still bring data back in an unsorted order.


In [8]:
cars.head()


Out[8]:
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Acura MDX SUV Asia All 36945.0 33337.0 3.5 6.0 265.0 17.0 23.0 4451.0 106.0 189.0
1 Acura 3.5 RL 4dr Sedan Asia Front 43755.0 39014.0 3.5 6.0 225.0 18.0 24.0 3880.0 115.0 197.0
2 Audi A41.8T convertible 2dr Sedan Europe Front 35940.0 32506.0 1.8 4.0 170.0 23.0 30.0 3638.0 105.0 180.0
3 Audi A6 3.0 4dr Sedan Europe Front 36640.0 33129.0 3.0 6.0 220.0 20.0 27.0 3561.0 109.0 192.0
4 Audi A6 2.7 Turbo Quattro 4dr Sedan Europe All 42840.0 38840.0 2.7 6.0 250.0 18.0 25.0 3836.0 109.0 192.0

Now let's try applying a sort order along with the inplace=True option. This will modify the CASTable object directly rather than returning a copy.


In [9]:
cars.sort_values(['Cylinders', 'EngineSize'], ascending=[True, False], inplace=True)
cars.head()


Out[9]:
Selected Rows from Table _T_8B0BFFFC_7FF29B2410C0
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Mazda RX-8 4dr manual Sports Asia Rear 27200.0 25179.0 1.3 NaN 238.0 18.0 24.0 3029.0 106.0 174.0
1 Mazda RX-8 4dr automatic Sports Asia Rear 25700.0 23794.0 1.3 NaN 197.0 18.0 25.0 3053.0 106.0 174.0
2 Honda Insight 2dr (gas/electric) Hybrid Asia Front 19110.0 17911.0 2.0 3.0 73.0 60.0 66.0 1850.0 95.0 155.0
3 GMC Canyon Z85 SL Regular Cab Truck USA Rear 16530.0 14877.0 2.8 4.0 175.0 18.0 25.0 3351.0 111.0 192.0
4 Chevrolet Colorado Z85 Truck USA All 18760.0 17070.0 2.8 4.0 175.0 18.0 23.0 3623.0 111.0 192.0

In [10]:
conn.close()

Conclusion

Dealing with distributed data can sometimes take some getting used to, but hopefully with these tips on how to sort your data when it is brought back to the client, you can retrieve your data in the form that you are looking for.


In [ ]: